Intelligibility Rating with Automatic Speech Recognition, Prosodic, and Cepstral Evaluation
Authors
Abstract
Speech intelligibility is an important criterion in voice rehabilitation. Automatic speech recognition combined with prosodic analysis has been shown to evaluate intelligibility successfully. In this paper, the method is extended with measures based on the Cepstral Peak Prominence (CPP). Seventy-three hoarse patients (48.3 ± 16.8 years) uttered the vowel /e/ and read the German version of the text “The North Wind and the Sun”. Their intelligibility was rated perceptually by 5 speech therapists and physicians on a 5-point scale. Support Vector Regression (SVR) yielded a feature set with a human-machine correlation of r = 0.85; it consists of the word accuracy, the smoothed CPP computed from a speech section, and three prosodic features (normalized energy of word-pause-word intervals, F0 value at voice offset in a word, and standard deviation of jitter). The average human-human correlation was r = 0.82. Hence, the automatic method can serve as a meaningful objective support for perceptual analysis.
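The abstract does not spell out how its two headline quantities are computed. As a minimal sketch, assuming the standard definitions rather than the authors' exact implementation, word accuracy can be taken as 1 - (S + D + I) / N from a word-level edit distance against the reference text, and the human-machine correlation as Pearson's r between automatic scores and averaged perceptual ratings. The function names and all sample values below are hypothetical and serve only as illustration.

```python
from math import sqrt


def word_accuracy(reference, hypothesis):
    """Word accuracy 1 - (S + D + I) / N, assuming the usual ASR definition.

    N is the number of reference words; S, D, I are the substitutions,
    deletions and insertions of a minimal word-level alignment.
    The value can drop below 0 when the recognizer inserts many words.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,              # deletion of a reference word
                cur[j - 1] + 1,           # insertion of a hypothesis word
                prev[j - 1] + (r != h),   # substitution (cost 0 on a match)
            ))
        prev = cur
    return 1.0 - prev[-1] / len(ref)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up recognizer output for one speaker (not data from the study).
    wa = word_accuracy("the north wind and the sun",
                       "the north wind and a sun")
    print(f"word accuracy: {wa:.3f}")

    # Made-up per-speaker values: automatic scores vs. mean of 5 raters.
    machine = [2.1, 3.4, 1.2, 4.6, 3.0]
    human = [2.0, 3.8, 1.4, 4.2, 2.6]
    print(f"human-machine r: {pearson_r(machine, human):.3f}")
```

In a setup like the one described, the automatic score would come from a regression model over several features, and the correlation would be computed across all speakers; the sketch only shows the two scoring steps in isolation.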
Similar resources
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making man-machine interaction more natural. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Automatic Rating of Hoarseness by Text-based Cepstral and Prosodic Evaluation
The standard for the analysis of distorted voices is perceptual rating of read-out texts or spontaneous speech. Automatic voice evaluation, however, is usually done on stable sections of sustained vowels. In this paper, text-based and established vowel-based analyses are compared with respect to their ability to measure hoarseness and its subclasses. 73 hoarse patients (48.3 ± 16.8 years) uttere...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
In recent years, automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Multi-System Fusion of Extended Context Prosodic and Cepstral Features for Paralinguistic Speaker Trait Classification
As automatic speech processing has matured, research attention has expanded to paralinguistic speech problems that aim to detect beyond-the-words information. This paper focuses on the identification of seven speaker trait categories from the Interspeech Speaker Trait Challenge: likeability, intelligibility, openness, conscientiousness, extraversion, agreeableness, and neuroticism. Our approach...
Evaluation of Tracheoesophageal Substitute Voices Using Prosodic Features
Tracheoesophageal (TE) speech is one way to restore the ability to speak after laryngectomy, i.e. after the removal of the larynx. TE speech often shows low audibility and intelligibility, which makes it a challenge for the patients to communicate. In speech rehabilitation the patient’s voice quality has to be evaluated. As no objective classification means exists until now and an automati...
Publication date: 2011